METHOD AND SYSTEM FOR PARAMETRIC CHARACTERIZATION
OF TRANSIENT AUDIO SIGNALS 

FIELD OF THE INVENTION 


The present invention relates to methods and systems for parametric characterisation and 
modelling of transient audio signals for encoding thereof. This invention is applicable in 
the area of digital audio compression at very low bit-rates. 



BACKGROUND OF THE INVENTION

The MPEG-4 parametric audio coding tools 'Harmonic and Individual Lines plus Noise'
(HILN) permit coding of general audio signals at bit-rates of 4 kbps and above using a
parametric representation of the audio signals (see Heiko Purnhagen, HILN - The MPEG-4
Parametric Audio Coding Tools, IEEE International Conference on Circuits and Systems,
May 2000, and Heiko Purnhagen, Advances in Parametric Audio Coding, IEEE Workshop on
Applications of Signal Processing to Audio and Acoustics, Oct. 1999). Figure 1 shows a
block diagram of a HILN parametric audio encoder. The input signal is first decomposed
into different components, and then the model parameters for the components' source
models are estimated such that:

• An individual sinusoid is described by its frequency and amplitude. 

• A harmonic tone is described by its fundamental frequency, amplitude and the spectral 
envelope of its partial harmonics. 

• A noise signal is described by its amplitude and spectral envelope.



Due to the very low target bit rates (e.g. 6-16 kbps), only the parameters for a small 
number of components can be transmitted. Therefore a perception model is employed to 
select those components that are most important for the perceptual quality of the signal. 
The quantization of the selected components is also done using the perceptual importance
criteria.

A slightly different approach was adopted by Goodwin (M. Goodwin, Adaptive Signal
Models: Theory, Algorithm and Audio Applications, PhD thesis, University of California,
Berkeley, 1997) for the atomic decomposition of audio signals. Consider an additive
signal model of the form:

$$x[n] = \sum_{i} c_i \, g_i[n]$$

wherein a signal is represented as a weighted sum of basic components (g_i[n]). These
building blocks or basic components are picked from an existing dictionary of many such
components. Because the dictionary is over-complete, it is possible to represent the same
signal with non-identical sets of basic components. The representation chosen must be the
one with the fewest basic components. This is the concept of compact representation, and
is the theme behind most advanced signal representation techniques such as wavelets. The
traditional transform coders, which use a set of complex exponentials (analogous to words
in the dictionary) as the basis for encoding input signals, are complete. Therefore there
is only one possible representation of an encoded signal, because there is a unique
Fourier Transform for a given signal. In the over-complete case, more than one
representation is possible, and an efficient coding scheme attempts to determine which is
most compact.

Sinusoidal modelling is suited best for stationary tonal signals. Transient signals (such as
beats) can be modeled well only by using a large number of such sinusoids with the
original phase preserved, as presented by Purnhagen in Advances in Parametric Audio
Coding. This is certainly not a compact representation of transient signals.



Goodwin [M. Goodwin, Matching Pursuit with Damped Sinusoids, IEEE International
Conference on Acoustics, Speech and Signal Processing, 1997] recommended the scheme
of damped sinusoids to model transients. However, his approach of matching pursuit is
relatively computationally expensive. It is desired to provide a simpler approach that
produces good results.

Moreover, the decay in the transient signal is generally modelled as a single exponential.
Figure 2 shows, however, that the envelope generated by the single exponential has
significant error relative to the true envelope. Accordingly, the single exponential model
is not sufficiently accurate. For a small increase in the number of parameters, it is
possible to describe the exact nature of the decay function more accurately.

SUMMARY OF THE INVENTION

The present invention provides a method of parametrically encoding a transient audio signal,
including the steps of:

(a) determining a set of frequency values V of the N largest frequency
components of the transient audio signal, where N is a predetermined number;

(b) determining an approximate envelope of the transient audio signal; and

(c) determining a predetermined number P of amplitude values W of
samples of the approximate envelope for use in generating a spline approximation of the
approximate envelope;

whereby a parametric representation of the transient audio signal is given by
parameters including V, N, P and W, such that a decoder receiving the parametric
representation can reproduce a decoder approximation of the transient audio signal.

Preferably, the method further includes the steps of:

(d) generating a spline approximation of the approximate envelope using a
spline interpolation function and the predetermined number P of samples W;

(e) generating an encoder-side approximation of the transient audio signal
based on the spline approximation and the parameters V, N, P and W;

(f) determining energy levels of the encoder-side approximation and the
transient audio signal, respectively; and

(g) determining a scaling factor as a function of the energy levels of the
encoder-side approximation and the transient audio signal for scaling the decoder
approximation to match an energy level thereof with the energy level of the transient audio
signal.

Preferably, the spline interpolation function is a cubic spline interpolation function.
Preferably, N is determined according to a bit rate of an audio encoder performing the
method.

Preferably, step (a) includes determining frequency components of the transient audio
signal by performing a fast Fourier transform thereof and selecting the N largest frequency
components of the determined frequency components. Preferably, step (b) includes
determining an absolute value version of the transient audio signal and low pass filtering
the absolute value version to generate an envelope. Preferably, the method further includes
scaling the decoder approximation to match an energy level thereof with an energy level of
the transient audio signal.

One aspect of the invention provides an encoder adapted to perform the method as 
described above. Another aspect of the invention provides a decoder adapted to decode a 
signal having a transient audio signal encoded according to the method described above. 

The present invention further provides a system for parametrically encoding a transient
audio signal, the system including:

means for determining a set of frequency values V of the N largest frequency
components of the transient audio signal, where N is a predetermined number;

means for determining an approximate envelope of the transient audio signal;

means for determining a predetermined number P of amplitude values W of
samples of the approximate envelope for use in generating a spline approximation of the
approximate envelope;

means for transmitting a parametric representation of the transient audio signal
comprising parameters including V, N, P and W, such that a decoder receiving the
parametric representation can reproduce a decoder approximation of the transient audio
signal.



The present invention provides an improvement on the method of damped sinusoids.
Instead of modeling the damping simply as an exponential (e^(-kn)) with parameter k, we
first derive a smooth envelope of the signal and then use spline interpolation functions
(preferably cubic) to approximate the envelope of the transient audio signal.

In the matching pursuit algorithm proposed by Goodwin, damped sinusoids are matched
against the residue signal in an iterative manner. In the present approach, the N largest
undamped sinusoids (which are found directly from the spectrum of the signal) are
used to generate an approximation of the transient signal, and then a cubic-spline
interpolated envelope is imposed onto the sinusoids. Therefore the present approach is
much simpler.

The transient modeling begins with the classification of a segment of an audio signal (of
length, say, I) as transient. Thereafter the following steps are performed:



1. Compute the Fast Fourier Transform of the segment x[n], to determine the
frequency coefficients X[k]:

$$X[k] = \sum_{n=0}^{I-1} x[n] \, e^{-j 2 \pi k n / I}, \qquad k = 0, \ldots, I/2 - 1$$

2. Form a set V of N indices such that: for each v ∈ V, 0 <= v < I/2 and |X[v]| >=
|X[w]|, where w ∉ V. In other words, V contains those indices that correspond to the
N largest frequency components.

3. The first approximation of the signal x[n] is:

$$\hat{x}[n] = \sum_{k \in V} \left[ \mathrm{real}(X[k]) \cos\!\left(\frac{2 \pi k n}{I}\right) - \mathrm{imag}(X[k]) \sin\!\left(\frac{2 \pi k n}{I}\right) \right]$$

where X[k] are the frequency coefficients of x[n] for each k ∈ V.

4. Derive a new signal x_abs[n] = |x[n]|. Perform a low-pass filtering of the
signal x_abs[n] with the filter H(z) = 1 + z^-1 + z^-2 + ... + z^-M, where M is the
order of the filter plus one.

5. The resultant filtered signal x_env[n] is taken as a good approximation of the
envelope of signal x[n].

6. Using P equidistant points W on x_env[n], perform a cubic-spline
interpolation to derive an approximation s[n] of the signal envelope.

7. Impose the spline onto the approximate signal x̂[n], i.e. y[n] = x̂[n] * s[n].

8. Compute a scale-factor a to match the energy of the reconstructed signal
with the original signal.

9. The parameters describing the transient x[n] are then: I, V, X[k] (for each
k ∈ V), W and a.






Advantageously, embodiments of the invention enable the transient audio signal to be 
more accurately reproduced at the decoder side. 






BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of the HILN parametric audio encoder model;

Figure 2 is a comparative plot, showing the absolute value of a transient signal, its
approximate envelope and the closest exponential decay function approximating the decay
of the transient audio signal over time;

Figure 3 shows an example of a transient audio signal, x[n];

Figure 4(a) shows the transient audio signal of Figure 3; Figures 4(b), (c) and (d) show
progressive summing of sinusoidal signals to arrive at a modelled version of the transient
audio signal in Figure 4(e);

Figure 5 shows comparative plots of the original transient audio signal, an absolute value
version thereof and an envelope thereof;

Figure 6 is a plot of the envelope shown in Figure 5, with a cubic spline approximation of
the envelope overlaid thereon;

Figure 7 shows the plots of Figures 4(b), (c), (d) and (e), but with the cubic spline-derived
envelope imposed thereon, resulting in plots 7(a), (b), (c) and (d);

Figure 8 is a block diagram of an improved HILN model encoder according to an
embodiment of the invention; and

Figure 9 is a block diagram of a decoder according to another embodiment of the
invention.

A detailed description of preferred embodiments of the invention is hereinafter provided, 
by way of example only, with reference to the accompanying drawings. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 



Consider a segment of audio signal that has been classified as transient. Several
approaches exist for detecting a transient, the most popular one being the Spectral Flatness
Measure or SFM. In the SFM method, the ratio of the geometric mean to the arithmetic
mean of the spectral values is computed. A high SFM ratio indicates a flatter spectrum,
which is more characteristic of an attack or transient. Smooth periodic signals, which are
predominantly composed of a fundamental frequency and a few harmonics, result in a spiky
spectrum and a small SFM value.
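
As an illustration of the SFM computation described above, a minimal Python sketch follows. It assumes that the "spectral values" are the power spectrum |X[k]|^2 over k = 0 ... I/2 - 1, uses a naive DFT from the standard library in place of an FFT, and adds a small constant eps before taking logarithms. The function names and eps are illustrative assumptions, and no classification threshold is implied.

```python
import cmath
import math


def power_spectrum(x):
    """Naive DFT power spectrum |X[k]|^2 for k = 0 .. len(x)//2 - 1."""
    size = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / size)
                    for n in range(size))) ** 2
            for k in range(size // 2)]


def spectral_flatness(x, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum."""
    p = [v + eps for v in power_spectrum(x)]  # eps (assumed) guards against log(0)
    geometric = math.exp(sum(math.log(v) for v in p) / len(p))
    arithmetic = sum(p) / len(p)
    return geometric / arithmetic


size = 64
tone = [math.sin(2 * math.pi * 4 * n / size) for n in range(size)]  # one spectral peak
click = [1.0 if n == 10 else 0.0 for n in range(size)]              # impulse, flat spectrum
sfm_tone = spectral_flatness(tone)
sfm_click = spectral_flatness(click)
```

In this sketch sfm_click comes out close to 1 (flat spectrum), while sfm_tone is close to 0 (a single spectral peak).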

Figure 3 shows the time domain samples of a castanet, which is a classic example of a
transient-type signal. Before the onset of the transient is a period of quiet, and after a very
brief period of pseudo-periodic activity (transient), the music decays quickly in a
somewhat exponential manner.

In order to parameterize this transient signal, we need to identify the basic atoms that
constitute this signal. In Goodwin's approach, one would seek to identify damped
sinusoids (each with an amplitude, frequency and decay factor), the sum of which forms a
close approximation of the given signal. As mentioned, this approach is quite
computationally expensive. In the present approach, a Discrete Fourier Transform or its
faster equivalent, the Fast Fourier Transform (FFT), is first used to determine the main
frequency components of the signal. Let X[k] be the frequency coefficients obtained after
performing an FFT on signal x[n].

Next we construct a set V of indices in the following manner. Choose k1 such that |X[k1]|
has the largest value over all k = 0 ... I/2 - 1 for a signal interval I. Add k1 to V. Now
choose k2 such that |X[k2]| has the largest value (excluding k1). Continue in this manner
to add indices to V. The number N of elements in V depends on the compression rate (the
lower the bit-rate, the fewer the elements). An approximation of the signal x[n] is given by:

$$\hat{x}[n] = \sum_{k \in V} \left[ \mathrm{real}(X[k]) \cos\!\left(\frac{2 \pi k n}{I}\right) - \mathrm{imag}(X[k]) \sin\!\left(\frac{2 \pi k n}{I}\right) \right]$$

This approximation is used on the decoder side to reconstruct the original transient signal 
from its major constituent frequency components. The reconstruction accuracy depends on 
the number of elements in V. However, for very low bit-rates, not many components can 
be transmitted. 
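
The selection of the N largest components and the sinusoidal reconstruction described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: a naive DFT stands in for the FFT, the helper names (half_spectrum, select_largest, synthesize) and the test signal are hypothetical, and the sum is evaluated exactly as written, without any 1/I normalisation, so the overall level is assumed to be corrected later by the energy-matching scale factor.

```python
import cmath
import math


def half_spectrum(x):
    """Naive DFT coefficients X[k] for k = 0 .. I/2 - 1 (an FFT would be used in practice)."""
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
            for k in range(size // 2)]


def select_largest(X, count):
    """Indices V of the `count` coefficients with the largest magnitude |X[k]|."""
    return sorted(range(len(X)), key=lambda k: abs(X[k]), reverse=True)[:count]


def synthesize(X, V, size):
    """Sum over k in V of real(X[k]) cos(2 pi k n / I) - imag(X[k]) sin(2 pi k n / I)."""
    return [sum(X[k].real * math.cos(2 * math.pi * k * n / size)
                - X[k].imag * math.sin(2 * math.pi * k * n / size) for k in V)
            for n in range(size)]


interval = 64  # segment length I (illustrative value)
# Hypothetical transient: two decaying sinusoids.
x = [math.exp(-n / 8.0) * (math.sin(2 * math.pi * 5 * n / interval)
                           + 0.5 * math.sin(2 * math.pi * 12 * n / interval))
     for n in range(interval)]
X = half_spectrum(x)
V = select_largest(X, 3)          # N = 3 components
x_hat = synthesize(X, V, interval)
```

Because the sinusoids in x_hat are undamped, the sketch reproduces the behaviour noted in the text: the approximation keeps ringing over the whole interval instead of decaying.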

Figure 4 shows the reconstruction of x[n] using the above principle. Plot (a) shows the
original transient signal. Plots (b), (c), (d) show the progressive summing of sinusoidal
signals to arrive at an approximation of the original signal, shown as plot (e). Note the
considerable ringing in the latter part of the reconstructed signal in plot (e). This ringing is
undesirable as it introduces an additional damping effect which reduces the sharpness of
the reproduced transient signal. With the three sinusoids summed as illustrated in Figure
4, a rough approximation of the transient is obtained. However, a considerable problem is
that the reconstructed signal does not decay as much as the original, due to the ringing.
Therefore the next step is to approximate the decay function.

To model the decay function, an envelope of the signal must be determined. A reasonable
way of obtaining the envelope is proposed here. Given the signal x[n], an absolute
magnitude version of the signal, x_abs[n] = |x[n]|, is derived. Following this, a low pass
filtering of the absolute signal x_abs[n] with the filter H(z) = 1 + z^-1 + z^-2 + ... + z^-M
is performed, where M is the order of the filter plus one. The low pass filtering removes
short-term fluctuations and so generates a kind of envelope x_env[n] of the signal. Figure 5
shows plots of x_abs[n] and x_env[n] obtained from the example signal x[n]. The filter used
to generate x_env[n] in Figure 5 is of order 20 (M = 21).

The purpose here is to parameterize the envelope so that it can be described to the decoder
at the receiver with few parameters. The objective is therefore to model the envelope
obtained through low pass filtering of the signal accurately and yet in a compact form.
Traditionally an exponential decay factor would be determined. However, since that is not
quite accurate, a more sophisticated method is used here employing cubic spline functions.
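
The rectify-and-filter envelope estimate described above can be sketched as follows. The sketch assumes that M counts the filter taps (order 20, M = 21), so each output sample is the sum of the current and the previous M - 1 rectified samples with unit coefficients; the function name, the parameter name taps and the test signal are hypothetical. No gain normalisation is applied, on the assumption that the scale factor computed later absorbs it.

```python
import math


def envelope(x, taps=21):
    """Return (x_abs, x_env): the rectified signal and its causal moving sum.

    `taps` is the assumed number of unit coefficients in H(z), i.e. filter order + 1.
    """
    x_abs = [abs(v) for v in x]
    # Each output sums the current and previous (taps - 1) rectified samples.
    x_env = [sum(x_abs[max(0, n - taps + 1):n + 1]) for n in range(len(x_abs))]
    return x_abs, x_env


# Hypothetical decaying oscillation standing in for a transient segment.
x = [math.exp(-n / 16.0) * math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
x_abs, x_env = envelope(x, taps=21)
```

A moving sum is the simplest low-pass filter consistent with the stated H(z); other smoothing filters would serve the same purpose.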

In order to interpolate the envelope using a spline function, it is necessary to determine the
sample points between which the envelope is to be interpolated. This is done by taking a
predetermined number P of samples W over the interval I of the transient signal. The
samples W are equally spaced over time within the interval I and include the first and last
samples thereof. The number P of samples W is determined, as an operational parameter,
depending on the desired decoder reproduction accuracy. In the example shown in Figure
6, P is 9.

Spline functions are important and powerful tools for a number of approximation tasks
such as interpolation, data fitting and the solution of boundary value problems for
differential equations.

In general, given sample points {x_i}, i = 0, ..., n, a function s belongs to the set
S_m(x_0, ..., x_n) of spline functions of degree m over the (n+1) points x_0, ..., x_n if:

1. s is a polynomial of degree at most m in each of the intervals
]-∞, x_0[, [x_0, x_1[, ..., [x_n, ∞[.

2. s and its first m-1 derivatives vary continuously over the points x_0, ..., x_n:

$$s^{(p)}(x_i^-) = s^{(p)}(x_i^+), \qquad p = 0, \ldots, m-1; \quad i = 0, 1, \ldots, n$$

Generally, s is a piecewise polynomial, i.e. a new polynomial in each sub-interval, and
these polynomials are glued together. Since any two adjacent polynomials and their first
m-1 derivatives s^(p)(.) are continuous at the interval boundaries, the overall effect is a
virtually smooth continuous function. The value of m can be as large as necessary; however,
m = 3 (cubic) is preferably used here since this degree gives a sufficiently smooth curve.
Figure 6 shows a spline-derived envelope approximation (C) of x_env[n] constructed using
nine equidistant points (W) on the envelope x_env[n].
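
Taking P equidistant samples W of the envelope and interpolating them with a cubic spline can be sketched as follows. The text does not specify the boundary conditions of the spline, so this sketch assumes a natural cubic spline (zero second derivative at both ends); the function names, the rounding of sample positions and the stand-in envelope are likewise assumptions made for illustration.

```python
import math


def sample_points(x_env, count):
    """Pick `count` (roughly) equidistant samples, including the first and last."""
    last = len(x_env) - 1
    knots = [round(i * last / (count - 1)) for i in range(count)]
    return knots, [x_env[k] for k in knots]


def natural_cubic_spline(knots, values):
    """Return s(t), the natural cubic spline through the points (knots, values)."""
    n = len(knots) - 1
    h = [knots[i + 1] - knots[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = (3 * (values[i + 1] - values[i]) / h[i]
                    - 3 * (values[i] - values[i - 1]) / h[i - 1])
    # Forward sweep of the tridiagonal system for the quadratic coefficients c.
    diag, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2 * (knots[i + 1] - knots[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    # Back substitution; c[0] = c[n] = 0 is the natural boundary condition.
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (values[j + 1] - values[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def s(t):
        j = n - 1  # default to the last segment (covers t equal to the last knot)
        for i in range(n):
            if t < knots[i + 1]:
                j = i
                break
        dt = t - knots[j]
        return values[j] + b[j] * dt + c[j] * dt ** 2 + d[j] * dt ** 3

    return s


x_env = [math.exp(-n / 16.0) for n in range(65)]  # stand-in envelope, 65 samples
knots, W = sample_points(x_env, 9)                # P = 9, as in Figure 6
s = natural_cubic_spline(knots, W)
s_n = [s(n) for n in range(len(x_env))]           # s[n] over the whole interval
```

Only the P values W need to be transmitted, since the knot positions follow from P and the interval length I.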

Imposing the spline function s[n] over the previously reconstructed transient signal x̂[n], a
better approximation y[n] = x̂[n] * s[n] of the original signal is obtained. This
approximation is better because the sinusoids, as such, are not damped, but rather a spline
function is used to shape the sinusoids according to the signal envelope. Finally, an
amplitude adjustment (scale) factor a is used to adjust the energy of the reconstructed
signal to that of the original signal. This adjustment is determined at the encoder side
from the ratio of the energy of the original transient signal to that of the modelled
transient signal.
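
Imposing the envelope and matching the energy can be sketched as follows. The text states only that the scale factor is derived from the ratio of the two energies; this sketch assumes an amplitude gain equal to the square root of that ratio, so that the scaled signal has exactly the energy of the original. The function names and sample values are hypothetical.

```python
import math


def energy(signal):
    """Sum of squared samples."""
    return sum(v * v for v in signal)


def impose_and_scale(x, x_hat, s):
    """Shape x_hat by the envelope s, then compute the gain matching the energy of x."""
    y = [u * v for u, v in zip(x_hat, s)]      # sample-by-sample product y[n]
    scale = math.sqrt(energy(x) / energy(y))   # assumes y is not identically zero
    return y, scale


x = [2.0, 0.0, -2.0, 0.0]        # original transient (toy values)
x_hat = [1.0, 0.0, -1.0, 0.0]    # undamped sinusoidal approximation
s = [1.0, 0.5, 0.25, 0.125]      # spline envelope
y, a = impose_and_scale(x, x_hat, s)
scaled = [a * v for v in y]      # what a decoder would output after scaling
```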

Figure 8 is a block diagram of a model of an encoder 10 according to an embodiment of
the invention. The encoder 10 improves on the standard HILN model by adding a signal
envelope generation module 12 as part of the parameter estimation block. An additional
quantizer 14 is provided at the output of the signal envelope generation module 12 as part
of the parameter coding block, and the output of the quantizer 14 is fed into the
multiplexer. The encoder 10 assumes detection of an interval of the audio signal as being
transient, after which the signal interval is fed into the signal envelope generation module
12 for parameterization thereof according to the method described above. A model based
decomposition module 11 within the encoder 10 determines whether the incoming audio
signal is to be classified as tonal, transient or noise, according to known methods, as well
as determining the fast Fourier transform of the input audio signal.

For the improved HILN model shown in Figure 8, parameter estimation is performed for
harmonic components (block 15) and noise components (block 17), as well as sinusoidal
components (block 16). Once the input audio signal is determined by the model based
decomposition module 11 to be transient, parameter estimation of the harmonic and noise
components in blocks 15, 17 is not required. Sinusoidal components block 16 determines
the N largest components (represented by the set V) of the input audio signal and these are
passed through a quantizer to multiplexer 20.

The signal envelope generation module 12 receives the input audio signal x[n] and
determines the envelope thereof by low pass filtering an absolute value version of the input
signal. The signal envelope generation module 12 then determines P equidistant points W
on the envelope and determines a spline interpolation of the envelope based on those P
points. The signal envelope generation module 12 also computes the scale factor a, and
the determined envelope parameters, including points W, are quantized and transmitted,
along with the scale factor a, via multiplexer 20. This information, together with the N
quantized values of set V transmitted through the sinusoidal components block 16, is used
by the decoder (shown in Figure 9) to reconstruct the transient audio signal.

Referring now to Figure 9, a decoder 40 is provided for receiving and decoding
compressed audio data which has been encoded by the encoder 10 shown in Figure 8. The
decoder 40 has a demultiplexer 50 for decompressing the received audio data and directing
it to harmonic, sinusoidal and noise component decoder modules 55, 56 and 57 and to
signal envelope reconstruction module 52. Alternatively, the compressed audio data may
be decompressed in a separate step before it is received by the demultiplexer. The set V of
N harmonics is used by the sinusoidal component module 56 to generate an approximation
x̂[n] of the signal according to step 3 above.

The signal envelope reconstruction module 52 receives the envelope information,
including points W and scale factor a, to generate a scaled cubic spline function s[n]
which, in combination with the signal approximation x̂[n], is used to reconstruct the
transient audio signal. The final reconstructed signal is represented by a x̂[n] * s[n].

The steps and modules described herein and depicted in the drawings may be performed or
constructed in either hardware or software or a combination of both, the implementation of
which will be apparent to those skilled in the art from the preceding description of the
invention and the drawings. Certain modifications may be made to the hereinbefore
described embodiments of the invention without departing from the spirit and scope of the
invention, and these will be apparent to persons skilled in the art.

CLAIMS:

1. A method of parametrically encoding a transient audio signal, including the steps of:

(a) determining a set of frequency values V of the N largest frequency
components of the transient audio signal, where N is a predetermined number;

(b) determining an approximate envelope of the transient audio signal; and

(c) determining a predetermined number P of amplitude values W of samples
of the approximate envelope for use in generating a spline approximation of the
approximate envelope;

whereby a parametric representation of the transient audio signal is given by
parameters including V, N, P and W, such that a decoder receiving the parametric
representation can reproduce a decoder approximation of the transient audio signal.

2. The method of claim 1, further including the steps of:

(a) generating a spline approximation of the approximate envelope using a
spline interpolation function and the amplitude values W;

(b) generating an encoder approximation of the transient audio signal based on
the spline approximation and the parameters V, N, P and W;

(c) determining energy levels of the encoder-side approximation and the
transient audio signal, respectively; and

(d) determining a scaling factor as a function of the energy levels of the
encoder approximation and the transient audio signal for scaling the decoder
approximation to match an energy level of the decoder approximation with the energy
level of the transient audio signal.

3. The method of claim 1 or 2, further including the step of transmitting the
parametric representation of the transient audio signal via a communication medium.

4. The method of claim 2, wherein the spline interpolation function is a cubic spline
interpolation function.

5. The method of claim 1 or 2, wherein N is determined according to a bit rate of an
audio encoder performing the method.

6. The method of claim 1, wherein step (a) includes:

determining frequency components of the transient audio signal by performing a
fast Fourier transform thereof; and

selecting the N largest frequency components of the determined frequency
components.

7. The method of claim 1, further including the step of determining an interval, I, of
the transient audio signal and wherein the parameters of the parametric representation
further include the interval, I.

8. The method of claim 7, wherein the samples W are equally spaced in time over the
interval I.

9. The method of claim 1, wherein the received approximation x̂[n] of the transient audio
signal x[n] is given by:

$$\hat{x}[n] = \sum_{k \in V} \left[ \mathrm{real}(X[k]) \cos\!\left(\frac{2 \pi k n}{I}\right) - \mathrm{imag}(X[k]) \sin\!\left(\frac{2 \pi k n}{I}\right) \right]$$

where X[k] are frequency coefficients of x[n] for each k ∈ V; and
I is the interval of the transient audio signal.

10. The method of claim 1, wherein step (b) includes:

determining an absolute value version x_abs[n] of the transient audio signal x[n]; and

low-pass filtering the absolute value version x_abs[n] to generate the approximate
envelope x_env[n].

11. An encoder adapted to perform the method of any one of claims 1 to 10. 







12. A decoder adapted to decode a signal having a transient audio signal encoded
according to the method of any one of claims 1 to 10.

13. A method of decoding a parametrically encoded transient audio signal, where the
transient audio signal is encoded according to the method of any one of claims 1 to 10, the
method of decoding including the steps of:

(a) receiving the parametric representation; and

(b) reproducing the decoder approximation of the transient audio signal
according to the parameters of the parametric representation by:

1) generating a sinusoidal signal by combining the set of frequency
values V of the N largest frequency components of the transient audio signal;

2) generating a spline approximation using a spline interpolation
function and the amplitude values W; and

3) applying the spline approximation to the sinusoidal signal.


14. The method of claim 13 when dependent upon claim 2, wherein the parameters 
include the scaling factor and the method of decoding further includes the step of: 

(a) scaling the energy level of the decoder approximation according to the 
scaling factor to match the energy level of the transient audio signal. 


15. A decoder adapted to perform the method of claim 13 or 14. 

16. A system for parametrically encoding a transient audio signal, the system including:

means for determining a set of frequency values V of the N largest frequency
components of the transient audio signal, where N is a predetermined number;

means for determining an approximate envelope of the transient audio signal;

means for determining a predetermined number P of amplitude values W of
samples of the approximate envelope for use in generating a spline approximation of the
approximate envelope;

means for transmitting a parametric representation of the transient audio signal
comprising parameters including V, N, P and W, such that a decoder receiving the
parametric representation can reproduce a decoder approximation of the transient audio
signal.